{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Gaussian Mixture Model Sine Curve\n\n\nThis example demonstrates the behavior of Gaussian mixture models fit on data\nthat was not sampled from a mixture of Gaussian random variables. The dataset\nis formed by 100 points loosely spaced following a noisy sine curve. There is\ntherefore no ground truth value for the number of Gaussian components.\n\nThe first model is a classical Gaussian Mixture Model with 10 components fit\nwith the Expectation-Maximization algorithm.\n\nThe second model is a Bayesian Gaussian Mixture Model with a Dirichlet process\nprior fit with variational inference. The low value of the concentration prior\nmakes the model favor a lower number of active components. This models\n\"decides\" to focus its modeling power on the big picture of the structure of\nthe dataset: groups of points with alternating directions modeled by\nnon-diagonal covariance matrices. Those alternating directions roughly capture\nthe alternating nature of the original sine signal.\n\nThe third model is also a Bayesian Gaussian mixture model with a Dirichlet\nprocess prior but this time the value of the concentration prior is higher\ngiving the model more liberty to model the fine-grained structure of the data.\nThe result is a mixture with a larger number of active components that is\nsimilar to the first model where we arbitrarily decided to fix the number of\ncomponents to 10.\n\nWhich model is the best is a matter of subjective judgement: do we want to\nfavor models that only capture the big picture to summarize and explain most of\nthe structure of the data while ignoring the details or do we prefer models\nthat closely follow the high density regions of the signal?\n\nThe last two panels show how we can sample from the last two models. The\nresulting samples distributions do not look exactly like the original data\ndistribution. The difference primarily stems from the approximation error we\nmade by using a model that assumes that the data was generated by a finite\nnumber of Gaussian components instead of a continuous noisy sine curve.\n\n\n"
      ]
    },
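    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nAs a minimal, self-contained sketch (an addition to this notebook, not part\nof the original example), the next cell illustrates the effect of the\nconcentration prior on a toy 1-D dataset with two well-separated blobs: a low\n`weight_concentration_prior` should leave most of the 10 available components\nwith negligible weight, while a high value should spread the weight over more\ncomponents. The `1e-2` cutoff used to count \"active\" components is an\narbitrary illustrative threshold.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# A hedged, self-contained sketch (not part of the original example):\n# illustrate how the Dirichlet process concentration prior controls the\n# number of active components on a toy two-blob 1-D dataset.\nimport numpy as np\nfrom sklearn.mixture import BayesianGaussianMixture\n\nrng = np.random.RandomState(0)\nX_toy = np.concatenate([rng.normal(-3., 0.5, 150),\n                        rng.normal(3., 0.5, 150)]).reshape(-1, 1)\n\nfor gamma_0 in (1e-2, 1e+2):  # low vs. high concentration prior\n    bgmm = BayesianGaussianMixture(\n        n_components=10, weight_concentration_prior=gamma_0,\n        weight_concentration_prior_type='dirichlet_process',\n        max_iter=100, random_state=0).fit(X_toy)\n    # count components with non-negligible posterior weight;\n    # the 1e-2 cutoff is an arbitrary illustrative threshold\n    n_active = (bgmm.weights_ > 1e-2).sum()\n    print('gamma_0=%g -> %d active components' % (gamma_0, n_active))"
      ]
    },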
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import itertools\n\nimport numpy as np\nfrom scipy import linalg\nimport matplotlib.pyplot as plt\nimport matplotlib as mpl\n\nfrom sklearn import mixture\n\nprint(__doc__)\n\ncolor_iter = itertools.cycle(['navy', 'c', 'cornflowerblue', 'gold',\n                              'darkorange'])\n\n\ndef plot_results(X, Y, means, covariances, index, title):\n    splot = plt.subplot(5, 1, 1 + index)\n    for i, (mean, covar, color) in enumerate(zip(\n            means, covariances, color_iter)):\n        v, w = linalg.eigh(covar)\n        v = 2. * np.sqrt(2.) * np.sqrt(v)\n        u = w[0] / linalg.norm(w[0])\n        # as the DP will not use every component it has access to\n        # unless it needs it, we shouldn't plot the redundant\n        # components.\n        if not np.any(Y == i):\n            continue\n        plt.scatter(X[Y == i, 0], X[Y == i, 1], .8, color=color)\n\n        # Plot an ellipse to show the Gaussian component\n        angle = np.arctan(u[1] / u[0])\n        angle = 180. * angle / np.pi  # convert to degrees\n        ell = mpl.patches.Ellipse(mean, v[0], v[1], 180. + angle, color=color)\n        ell.set_clip_box(splot.bbox)\n        ell.set_alpha(0.5)\n        splot.add_artist(ell)\n\n    plt.xlim(-6., 4. * np.pi - 6.)\n    plt.ylim(-5., 5.)\n    plt.title(title)\n    plt.xticks(())\n    plt.yticks(())\n\n\ndef plot_samples(X, Y, n_components, index, title):\n    plt.subplot(5, 1, 4 + index)\n    for i, color in zip(range(n_components), color_iter):\n        # as the DP will not use every component it has access to\n        # unless it needs it, we shouldn't plot the redundant\n        # components.\n        if not np.any(Y == i):\n            continue\n        plt.scatter(X[Y == i, 0], X[Y == i, 1], .8, color=color)\n\n    plt.xlim(-6., 4. * np.pi - 6.)\n    plt.ylim(-5., 5.)\n    plt.title(title)\n    plt.xticks(())\n    plt.yticks(())\n\n\n# Parameters\nn_samples = 100\n\n# Generate random sample following a sine curve\nnp.random.seed(0)\nX = np.zeros((n_samples, 2))\nstep = 4. * np.pi / n_samples\n\nfor i in range(X.shape[0]):\n    x = i * step - 6.\n    X[i, 0] = x + np.random.normal(0, 0.1)\n    X[i, 1] = 3. 
* (np.sin(x) + np.random.normal(0, .2))\n\nplt.figure(figsize=(10, 10))\nplt.subplots_adjust(bottom=.04, top=0.95, hspace=.2, wspace=.05,\n                    left=.03, right=.97)\n\n# Fit a Gaussian mixture with EM using ten components\ngmm = mixture.GaussianMixture(n_components=10, covariance_type='full',\n                              max_iter=100).fit(X)\nplot_results(X, gmm.predict(X), gmm.means_, gmm.covariances_, 0,\n             'Expectation-maximization')\n\ndpgmm = mixture.BayesianGaussianMixture(\n    n_components=10, covariance_type='full', weight_concentration_prior=1e-2,\n    weight_concentration_prior_type='dirichlet_process',\n    mean_precision_prior=1e-2, covariance_prior=1e0 * np.eye(2),\n    init_params=\"random\", max_iter=100, random_state=2).fit(X)\nplot_results(X, dpgmm.predict(X), dpgmm.means_, dpgmm.covariances_, 1,\n             \"Bayesian Gaussian mixture models with a Dirichlet process prior \"\n             r\"for $\\gamma_0=0.01$.\")\n\nX_s, y_s = dpgmm.sample(n_samples=2000)\nplot_samples(X_s, y_s, dpgmm.n_components, 0,\n             \"Gaussian mixture with a Dirichlet process prior \"\n             r\"for $\\gamma_0=0.01$ sampled with $2000$ samples.\")\n\ndpgmm = mixture.BayesianGaussianMixture(\n    n_components=10, covariance_type='full', weight_concentration_prior=1e+2,\n    weight_concentration_prior_type='dirichlet_process',\n    mean_precision_prior=1e-2, covariance_prior=1e0 * np.eye(2),\n    init_params=\"kmeans\", max_iter=100, random_state=2).fit(X)\nplot_results(X, dpgmm.predict(X), dpgmm.means_, dpgmm.covariances_, 2,\n             \"Bayesian Gaussian mixture models with a Dirichlet process prior \"\n             r\"for $\\gamma_0=100$\")\n\nX_s, y_s = dpgmm.sample(n_samples=2000)\nplot_samples(X_s, y_s, dpgmm.n_components, 1,\n             \"Gaussian mixture with a Dirichlet process prior \"\n             r\"for $\\gamma_0=100$ sampled with $2000$ samples.\")\n\nplt.show()"
      ]
    }
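    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nAs a quick follow-up check (an addition to this notebook, not part of the\noriginal example), the cell below inspects the mixture weights of the fitted\nmodels to make the notion of \"active\" components concrete. It assumes the\nprevious cell has been run, so that `gmm` and `dpgmm` (the $\\gamma_0=100$\nmodel) are still in scope; the `1e-2` weight cutoff is again an arbitrary\nillustrative threshold.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Hedged follow-up (not part of the original example): compare the fitted\n# models through their mixture weights and mean log-likelihood on X.\n# Assumes the previous cell defined `gmm` and `dpgmm` (gamma_0=100 model).\nfor name, model in [('EM, 10 components', gmm),\n                    ('DP prior, gamma_0=100', dpgmm)]:\n    # the 1e-2 cutoff on the weights is an arbitrary illustrative threshold\n    n_active = (model.weights_ > 1e-2).sum()\n    print('%s: mean log-likelihood %.3f, %d/%d active components'\n          % (name, model.score(X), n_active, model.n_components))\n\nprint('DP model weights:', np.round(dpgmm.weights_, 3))"
      ]
    }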
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}